Skip to content

Conversation

@aseembits93
Copy link
Contributor

@aseembits93 aseembits93 commented Sep 5, 2025

PR Type

Enhancement


Description

  • Add AI-powered candidate ranking endpoint

  • Compute unified diffs for ranking

  • Improve refinement logging conditionally

  • Fallback to heuristic ranking on failure


Diagram Walkthrough

flowchart LR
  A["Collect valid candidates"] --> B["Compute diff strings & speedups"]
  B --> C["Call AI /rank with diffs/speedups"]
  C -- "success" --> D["Use returned ranking"]
  C -- "failure/None" --> E["Fallback heuristic ranking"]
  D --> F["Select best candidate"]
  E --> F
Loading

File Walkthrough

Relevant files
Enhancement
aiservice.py
Add AI service ranking request handler                                     

codeflash/api/aiservice.py

  • Add generate_ranking API method.
  • Build payload with diffs, speedups, ids.
  • Handle request/response, errors, timeouts.
  • Log telemetry and return ranking.
+53/-0   
code_utils.py
Introduce unified diff string utility                                       

codeflash/code_utils/code_utils.py

  • Add unified_diff_strings helper.
  • Return unified diff between two strings.
+17/-0   
function_optimizer.py
Integrate AI ranking into candidate selection                       

codeflash/optimization/function_optimizer.py

  • Use unified diffs, speedups, ids for ranking.
  • Call AI ranking asynchronously with fallback.
  • Adjust index from 1-based to 0-based result.
  • Log refinement additions only when non-empty.
+36/-8   

@github-actions
Copy link

github-actions bot commented Sep 5, 2025

PR Reviewer Guide 🔍

(Review updated until commit 95114bd)

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 3 🔵🔵🔵⚪⚪
🧪 No relevant tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

Index Handling

The AI ranking appears to be 1-based and is converted to 0-based via [x - 1 for x in ranking], but there’s no validation that the returned indices map correctly to the local candidate list length/order. If the service returns unexpected values (e.g., out-of-range or empty), this could raise or pick the wrong candidate.

if ranking:
    ranking = [x - 1 for x in ranking]
    min_key = ranking[0]
else:
    diff_lens_ranking = create_rank_dictionary_compact(diff_lens_list)
    runtimes_ranking = create_rank_dictionary_compact(runtimes_list)
    # TODO: better way to resolve conflicts with same min ranking
    overall_ranking = {key: diff_lens_ranking[key] + runtimes_ranking[key] for key in diff_lens_ranking.keys()}  # noqa: SIM118
    min_key = min(overall_ranking, key=overall_ranking.get)
Blocking Wait

The code always waits on the ranking future with concurrent.futures.wait([future_ranking]) and then .result() without a timeout or fallback at the wait level; consider using a timeout or as_completed to avoid potential indefinite blocking if the request hangs despite internal timeout.

concurrent.futures.wait([future_ranking])
ranking = future_ranking.result()
if ranking:
    ranking = [x - 1 for x in ranking]
    min_key = ranking[0]
Diff Formatting

unified_diff_strings uses lineterm="", which removes newlines between diff lines; some consumers expect newline-terminated hunks. Verify downstream parsing and consider defaulting to newline-terminated lines for compatibility.

diff = difflib.unified_diff(code1_lines, code2_lines, fromfile=fromfile, tofile=tofile, lineterm="")

return "".join(diff)

@github-actions
Copy link

github-actions bot commented Sep 5, 2025

PR Code Suggestions ✨

Latest suggestions up to 95114bd
Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
Possible issue
Validate external ranking indices

Validate the AI service ranking indices to prevent out-of-range or negative values
after converting to 0-based. Fallback to the local heuristic if the ranking is
empty, mismatched in length, contains duplicates, or any index is invalid.

codeflash/optimization/function_optimizer.py [711-719]

 ranking = future_ranking.result()
-if ranking:
-    ranking = [x - 1 for x in ranking]
-    min_key = ranking[0]
+use_ai_ranking = (
+    isinstance(ranking, list)
+    and len(ranking) == len(valid_candidates_with_shorter_code)
+    and all(isinstance(x, int) for x in ranking)
+)
+if use_ai_ranking:
+    zero_based = [x - 1 for x in ranking]
+    valid_indices = set(range(len(valid_candidates_with_shorter_code)))
+    use_ai_ranking = (
+        len(set(zero_based)) == len(zero_based)
+        and all(x in valid_indices for x in zero_based)
+    )
+if use_ai_ranking:
+    min_key = zero_based[0]
 else:
     diff_lens_ranking = create_rank_dictionary_compact(diff_lens_list)
     runtimes_ranking = create_rank_dictionary_compact(runtimes_list)
-    # TODO: better way to resolve conflicts with same min ranking
     overall_ranking = {key: diff_lens_ranking[key] + runtimes_ranking[key] for key in diff_lens_ranking.keys()}  # noqa: SIM118
     min_key = min(overall_ranking, key=overall_ranking.get)
Suggestion importance[1-10]: 8

__

Why: Adds robust validation for AI-provided indices, preventing crashes or misselection if the service returns invalid data; aligns with the new ranking flow and cleanly falls back to local heuristics.

Medium
Validate AI ranking payload

Validate the response payload to ensure ranking is a list of integers before
returning it. Log and return None when the structure is invalid to prevent
downstream crashes.

codeflash/api/aiservice.py [396-399]

 if response.status_code == 200:
-    ranking: list[int] = response.json()["ranking"]
+    data = response.json()
+    ranking = data.get("ranking")
+    if isinstance(ranking, list) and all(isinstance(x, int) for x in ranking):
+        console.rule()
+        return ranking
+    logger.error(f"Invalid ranking payload: {data}")
+    ph("cli-optimize-error-response", {"response_status_code": response.status_code, "error": "invalid ranking payload"})
     console.rule()
-    return ranking
+    return None
Suggestion importance[1-10]: 7

__

Why: Ensures the response contains a proper integer list before returning, reducing downstream errors; fits the new endpoint handling with a clear fallback to None.

Medium
General
Sanitize speedup values

Guard against non-finite or negative speedup values to avoid sending misleading
inputs to the ranking service. Clamp to a minimum of 1.0 and replace NaN/inf with
1.0 to represent no improvement.

codeflash/optimization/function_optimizer.py [694-699]

-speedups_list.append(
-    1
-    + performance_gain(
-        original_runtime_ns=original_code_baseline.runtime, optimized_runtime_ns=new_best_opt.runtime
-    )
+gain = performance_gain(
+    original_runtime_ns=original_code_baseline.runtime,
+    optimized_runtime_ns=new_best_opt.runtime,
 )
+speedup = 1 + (gain if gain is not None else 0)
+# Ensure finite, non-negative speedup (at least 1.0 meaning no speedup)
+if not isinstance(speedup, (int, float)) or not math.isfinite(speedup) or speedup < 1.0:
+    speedup = 1.0
+speedups_list.append(speedup)
Suggestion importance[1-10]: 6

__

Why: Reasonable defensive check to avoid sending NaN/inf/invalid speedups to the service; moderate impact and requires importing math, but conceptually sound.

Low

Previous suggestions

Suggestions up to commit 3cb5a2d
CategorySuggestion                                                                                                                                    Impact
Possible issue
Await Future and validate result

You are using self.executor.submit(...) but never call .result(), so ranking is a
Future and list operations will fail at runtime. Retrieve the result with a timeout
and handle None or exceptions gracefully. Also validate the result type before
indexing.

codeflash/optimization/function_optimizer.py [700-715]

-ranking = self.executor.submit(
+future = self.executor.submit(
     ai_service_client.generate_ranking,
     diffs=diff_strs,
     optimization_ids=optimization_ids,
     speedups=speedups_list,
     trace_id=self.function_trace_id[:-4] + exp_type if self.experiment_id else self.function_trace_id,
 )
-ranking = [x - 1 for x in ranking]
-if ranking:
-    min_key = ranking[0]
+try:
+    ranking = future.result(timeout=70)
+except Exception:
+    ranking = None
+if isinstance(ranking, list) and ranking:
+    # service returns 1-based indices; convert to 0-based safely
+    zero_based = [x - 1 for x in ranking if isinstance(x, int)]
+    min_key = zero_based[0] if zero_based else None
 else:
+    min_key = None
+if min_key is None:
     diff_lens_ranking = create_rank_dictionary_compact(diff_lens_list)
     runtimes_ranking = create_rank_dictionary_compact(runtimes_list)
     # TODO: better way to resolve conflicts with same min ranking
     overall_ranking = {key: diff_lens_ranking[key] + runtimes_ranking[key] for key in diff_lens_ranking.keys()}  # noqa: SIM118
     min_key = min(overall_ranking, key=overall_ranking.get)
Suggestion importance[1-10]: 9

__

Why: The code assigns the Future from self.executor.submit(...) to ranking and then treats it as a list, which would fail; adding .result() with error handling is correct and critical. The improved code accurately reflects the fix and preserves the fallback path if the service fails.

High
Guard index normalization and bounds

Blindly subtracting 1 can produce negative indices if the service returns 0-based
indices or invalid values, leading to wrong selection. Clamp to valid range and
ignore out-of-bounds entries before choosing.

codeflash/optimization/function_optimizer.py [707-710]

-ranking = [x - 1 for x in ranking]
-if ranking:
-    min_key = ranking[0]
+if isinstance(ranking, list):
+    zero_based = []
+    for x in ranking:
+        if isinstance(x, int):
+            idx = x - 1
+            if 0 <= idx < len(valid_candidates_with_shorter_code):
+                zero_based.append(idx)
+    if zero_based:
+        min_key = zero_based[0]
Suggestion importance[1-10]: 7

__

Why: Validates the assumption about 1-based indices and prevents negative or out-of-range indices, improving robustness. Moderate impact since it depends on service contract but reduces potential runtime errors.

Medium
General
Use unshifted speedup metric

performance_gain likely returns a ratio; adding 1 shifts the baseline and can
mislead the ranker. Pass the actual speedup (baseline/optimized) or ratio
consistently with the backend expectation to avoid incorrect ordering.

codeflash/optimization/function_optimizer.py [692-697]

 speedups_list.append(
-    1
-    + performance_gain(
-        original_runtime_ns=original_code_baseline.runtime, optimized_runtime_ns=new_best_opt.runtime
+    performance_gain(
+        original_runtime_ns=original_code_baseline.runtime,
+        optimized_runtime_ns=new_best_opt.runtime,
     )
 )
Suggestion importance[1-10]: 5

__

Why: The concern about adding 1 to the performance_gain output may be valid but depends on backend expectations not shown in the diff. It’s a reasonable improvement suggestion but uncertain in correctness without more context.

Low

@aseembits93 aseembits93 marked this pull request as ready for review September 11, 2025 20:59
@github-actions
Copy link

Persistent review updated to latest commit 95114bd

Parameters
----------
- source_code (str): The python code to optimize.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO : change the docstring

@aseembits93
Copy link
Contributor Author

@misrasaurabh1 ready to merge

@misrasaurabh1 misrasaurabh1 merged commit 2802ae6 into main Sep 12, 2025
18 of 21 checks passed
@aseembits93 aseembits93 deleted the ranker branch September 22, 2025 18:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants